Feature Selection Forcing Overtraining May Help to Improve Performance

Authors

  • Enrique Romero
  • Josep M. Sopena
  • Gorka Navarrete
  • René Alquézar
Abstract

One of the main drawbacks of Machine Learning systems is the negative effect caused by overtraining. If the points in the dataset are perfectly fitted, the generalization performance is usually poor. We propose to take advantage of overtraining, together with Feature Selection, to improve the performance of a learning system. The main idea rests on the hypothesis that when the dataset is fitted as closely as possible, the system is forced to use all the available variables as much as possible. Noisy and useless variables can be detected if generalization improves when the system is not allowed to use them. When overtraining is forced, noisy and useless variables should stand out more clearly. To test this hypothesis, we performed several Feature Selection experiments using Feed-forward Neural Networks. The particular Feature Selection procedure used was Sequential Backward Selection. Experimental results on several real-world problems suggest that the hypothesis is well-founded. Ironically, forcing overtraining may help to achieve good performance.
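The procedure outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a 1-nearest-neighbour classifier (which fits its training set perfectly) is used here as a cheap stand-in for an overtrained feed-forward network, the data are synthetic, and the names `one_nn_error`, `sequential_backward_selection`, and `make_data`, as well as the stopping rule, are assumptions made for illustration only.

```python
import random


def one_nn_error(train, valid, features):
    """Validation error of a 1-NN classifier restricted to `features`.

    Assumption: 1-NN memorises the training set, so it plays the role of
    a model that has been fitted "as much as possible".
    """
    errors = 0
    for xv, yv in valid:
        # Nearest training point, measured only on the allowed features.
        nearest = min(
            train,
            key=lambda t: sum((t[0][j] - xv[j]) ** 2 for j in features),
        )
        if nearest[1] != yv:
            errors += 1
    return errors / len(valid)


def sequential_backward_selection(train, valid, n_features):
    """Start from all features and greedily drop one feature at a time.

    Assumed stopping rule: stop as soon as every possible removal makes
    the validation error worse than the current best.
    """
    selected = list(range(n_features))
    best_err = one_nn_error(train, valid, selected)
    history = [(tuple(selected), best_err)]
    while len(selected) > 1:
        # Score every candidate removal on the validation set.
        candidates = [
            (one_nn_error(train, valid, [f for f in selected if f != r]), r)
            for r in selected
        ]
        err, removed = min(candidates)
        if err > best_err:
            break  # no removal helps generalization any more
        selected.remove(removed)
        best_err = err
        history.append((tuple(selected), best_err))
    return selected, best_err, history


def make_data(n, n_features, rng):
    """Synthetic data: only feature 0 determines the label, the rest is noise."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        data.append((x, 1 if x[0] > 0 else 0))
    return data


rng = random.Random(0)
train = make_data(200, 4, rng)
valid = make_data(300, 4, rng)
selected, final_err, history = sequential_backward_selection(train, valid, 4)
for subset, err in history:
    print(subset, round(err, 3))
```

Each printed line shows a feature subset and its validation error. In this toy setting the noise features are the ones whose removal improves the held-out error, which is the effect the hypothesis relies on.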

Similar resources

Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets

Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...

A Random Forest Classifier based on Genetic Algorithm for Cardiovascular Diseases Diagnosis (RESEARCH NOTE)

Machine learning-based classification techniques provide support for the decision making process in the field of healthcare, especially in disease diagnosis, prognosis and screening. Healthcare datasets are voluminous in nature and their high dimensionality problem comprises in terms of slower learning rate and higher computational cost. Feature selection is expected to deal with the high dimen...

A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems

Intrusion detection systems are designed to provide security in computer networks, so that if the attacker crosses other security devices, they can detect and prevent the attack process. One of the most essential challenges in designing these systems is the so called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...

Trainability of young athletes and overtraining.

Exercise adaptations to strength, anaerobic and aerobic training have been extensively studied in adults, however, young people appear to respond differently to such exercise stimulus in comparison to adults. In addition, because overtraining in young athletes has received little attention, this important area is also discussed. Resistance training in children can be safe and effective. It has ...

Publication date: 2003